Abstract

Current state-of-the-art discrete optimization methods lag behind when it comes to challenging
contrast-enhancing discrete energies (i.e., energies favoring different labels for neighboring
variables). This work suggests a multiscale approach for these challenging problems. Deriving an
algebraic representation allows us to coarsen any pair-wise energy using any interpolation in a
principled algebraic manner. Furthermore, we propose an energy-aware interpolation operator that
efficiently exposes the multiscale landscape of the energy, yielding an effective coarse-to-fine
optimization scheme. Results on challenging contrast-enhancing energies show significant
improvement over state-of-the-art methods.

1 Introduction 

We consider discrete pair-wise energies, defined over a (weighted) graph $(\mathcal{V}, \mathcal{E})$:

$$E(L) = \sum_{i \in \mathcal{V}} \varphi_i(l_i) + \sum_{(i,j) \in \mathcal{E}} w_{ij}\, \varphi(l_i, l_j) \tag{1}$$

where $\mathcal{V}$ is the set of variables and $\mathcal{E}$ is the set of edges. The sought solution is a
discrete vector $L \in \{1, \ldots, l\}^n$, with $n$ variables each taking one of $l$ possible labels,
minimizing (1).

Most energy instances of form (1) considered in the literature are smoothness preserving: that is,
assigning neighboring variables to the same label costs less energy. Smoothness preserving energies
include submodular [15], metric and semi-metric [ ] energies. State-of-the-art optimization
algorithms (e.g., TRW-S [ ], large move [ ] and dual decomposition (DD) [ ]) handle smoothness
preserving energies well, yielding close to optimal results. However, when it comes to
contrast-enhancing energies (i.e., favoring different labels for neighboring variables) existing
algorithms provide poor approximations (see e.g., [ , example 8.1], [ , §5.1]). For
contrast-enhancing energies the relaxation of TRW and DD is no longer tight, and therefore they
converge to a far from optimal solution.

This work suggests a multiscale approach to the optimization of contrast-enhancing energies.
Coarse-to-fine exploration of the solution space allows us to effectively avoid getting stuck in local
minima. Our work makes two major contributions: (i) an algebraic representation of the energy that
allows for a principled derivation of the coarse-scale energy using any linear coarse-to-fine
interpolation; (ii) an energy-aware method for computing the interpolation operator, which
efficiently exposes the multiscale landscape of the energy.

Multiscale approaches for discrete optimization have been proposed in the past (e.g., [ , 14, 6, 10, 12,
]). However, they focus mainly on accelerating the optimization process of smoothness preserving
energies. Furthermore, these methods are usually restricted to a dyadic coarsening of grid-based
energies, and suggest "ad-hoc" and heuristic derivation of the coarse-scale energy (e.g., [10, §3]).
In contrast, our framework suggests a principled derivation of the coarse-scale energy using a novel
energy-aware interpolation, yielding low energy solutions.

2 Multiscale Energy Pyramid 

Our algebraic representation requires the substitution of vector $L$ in (1) with an equivalent binary
matrix representation $U \in \{0, 1\}^{n \times l}$. The rows of $U$ correspond to the variables, and
the columns correspond to labels: $U_{i,\alpha} = 1$ iff variable $i$ is labeled $\alpha$
($l_i = \alpha$). Expressing the energy (1) using $U$ yields a quadratic representation:

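$$E(U) = \mathrm{Tr}\left(D U^T + W U V U^T\right) \tag{2}$$

$$\text{s.t.} \quad U \in \{0, 1\}^{n \times l}, \quad \sum_{\alpha=1}^{l} U_{i,\alpha} = 1 \tag{3}$$

where $W = \{w_{ij}\}$, $D \in \mathbb{R}^{n \times l}$ s.t. $D_{i,\alpha} = \varphi_i(\alpha)$, and
$V \in \mathbb{R}^{l \times l}$ s.t. $V_{\alpha,\beta} = \varphi(\alpha, \beta)$,
$\alpha, \beta \in \{1, \ldots, l\}$. An energy over $n$ variables with $l$ labels is now
parameterized by $(n, l, D, W, V)$.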
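
The equivalence between the summation form (1) and the trace form (2) can be checked numerically. The following is a minimal sketch, not part of the original derivation: the function names are illustrative, labels are 0-based, and the pair-wise sum is assumed to run over ordered pairs $(i, j)$ with a symmetric $W$, which is the convention the trace form implies.

```python
import numpy as np

def one_hot(labels, num_labels):
    """Binary n x l matrix U with U[i, a] = 1 iff variable i has label a."""
    U = np.zeros((len(labels), num_labels))
    U[np.arange(len(labels)), labels] = 1.0
    return U

def trace_energy(U, D, W, V):
    """Quadratic form (2): Tr(D U^T + W U V U^T)."""
    return float(np.trace(D @ U.T + W @ U @ V @ U.T))

def summed_energy(labels, D, W, V):
    """Explicit sum of unary terms and weighted pair-wise terms over ordered pairs."""
    n = len(labels)
    total = sum(D[i, labels[i]] for i in range(n))
    for i in range(n):
        for j in range(n):
            total += W[i, j] * V[labels[i], labels[j]]
    return float(total)

rng = np.random.default_rng(0)
n, l = 6, 3
D = rng.normal(size=(n, l))
W = rng.uniform(-1, 1, size=(n, n))
W = (W + W.T) / 2           # symmetric weights, possibly negative
np.fill_diagonal(W, 0.0)    # no self-edges
V = rng.uniform(size=(l, l))
V = (V + V.T) / 2           # symmetric label costs
labels = rng.integers(0, l, size=n)
U = one_hot(labels, l)

print(trace_energy(U, D, W, V), summed_energy(labels, D, W, V))
```

Both printed values should agree up to floating-point error for any labeling.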

Let $(n^f, l, D^f, W^f, V)$ be the fine scale energy. We wish to generate a coarser representation
$(n^c, l, D^c, W^c, V)$ with fewer variables, $n^c < n^f$. This representation approximates
$E(U^f)$ using fewer variables: $U^c$ with only $n^c$ rows.

An interpolation matrix $P \in [0, 1]^{n^f \times n^c}$ s.t. $\sum_j P_{ij} = 1$ $\forall i$, maps a
coarse assignment $U^c$ to a fine assignment $P U^c$. For any fine assignment that can be
approximated by a coarse assignment $U^c$, i.e., $U^f \approx P U^c$, we can write eq. (2):

$$E(U^f) = \mathrm{Tr}\left(D^f {U^f}^T + W^f U^f V {U^f}^T\right) \approx \mathrm{Tr}\left(D^f {U^c}^T P^T + W^f P U^c V {U^c}^T P^T\right)$$

$$= \mathrm{Tr}\left(\left(P^T D^f\right) {U^c}^T + \left(P^T W^f P\right) U^c V {U^c}^T\right) = \mathrm{Tr}\left(D^c {U^c}^T + W^c U^c V {U^c}^T\right) = E(U^c) \tag{4}$$



We have generated a coarse energy $E(U^c)$ parameterized by $(n^c, l, D^c, W^c, V)$ that approximates
the fine energy $E(U^f)$. This coarse energy is of the same form as the original energy, allowing us to
apply the coarsening procedure recursively to construct an energy pyramid.
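
As an illustration of eq. (4), the sketch below computes the coarse parameters as $P^T D^f$ and $P^T W^f P$ and checks that the coarse energy of $U^c$ equals the fine energy of the interpolated assignment $P U^c$. The hard pairing of variables used for $P$ is an arbitrary choice made only for this example, and the function names are illustrative.

```python
import numpy as np

def coarsen_variables(D_f, W_f, P):
    """Coarse-scale parameters read off eq. (4): D^c = P^T D^f, W^c = P^T W^f P."""
    return P.T @ D_f, P.T @ W_f @ P

def trace_energy(U, D, W, V):
    """Quadratic form (2): Tr(D U^T + W U V U^T)."""
    return float(np.trace(D @ U.T + W @ U @ V @ U.T))

rng = np.random.default_rng(1)
n_f, n_c, l = 6, 3, 2
D_f = rng.normal(size=(n_f, l))
W_f = rng.uniform(-1, 1, size=(n_f, n_f))
W_f = (W_f + W_f.T) / 2
V = 1.0 - np.eye(l)

# Hard aggregation (illustrative): fine variables 2k and 2k+1 share coarse variable k.
P = np.zeros((n_f, n_c))
P[np.arange(n_f), np.arange(n_f) // 2] = 1.0

D_c, W_c = coarsen_variables(D_f, W_f, P)

U_c = np.eye(l)[rng.integers(0, l, size=n_c)]   # a random coarse one-hot assignment
e_coarse = trace_energy(U_c, D_c, W_c, V)
e_fine = trace_energy(P @ U_c, D_f, W_f, V)     # energy of the interpolated assignment
print(e_coarse, e_fine)
```

The identity is purely algebraic, so it holds for soft (fractional) $P$ as well; the approximation in (4) enters only through restricting fine assignments to the form $P U^c$.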

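Our principled algebraic representation allows us to perform label coarsening in a similar manner.
Looking at a different interpolation matrix $\hat{P} \in [0, 1]^{l^{\hat{f}} \times l^{\hat{c}}}$, we
interpolate a coarse solution by $U^{\hat{f}} \approx U^{\hat{c}} \hat{P}^T$. This time the
interpolation matrix $\hat{P}$ acts on the labels, i.e., the columns of $U$. The coarse labeling
matrix $U^{\hat{c}}$ has the same number of rows (variables), but fewer columns (labels).
Coarsening the labels yields:

$$E(U^{\hat{c}}) = \mathrm{Tr}\left(\left(D^{\hat{f}} \hat{P}\right) {U^{\hat{c}}}^T + W U^{\hat{c}} \left(\hat{P}^T V^{\hat{f}} \hat{P}\right) {U^{\hat{c}}}^T\right) \tag{5}$$

Again, we end up with the same type of energy, but this time it is defined over a smaller number of
discrete labels: $(n, l^{\hat{c}}, D^{\hat{c}}, W, V^{\hat{c}})$, where
$D^{\hat{c}} = D^{\hat{f}} \hat{P}$ and $V^{\hat{c}} = \hat{P}^T V^{\hat{f}} \hat{P}$.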
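
Label coarsening can be sketched in the same way. In the example below the label interpolation matrix (named `P_hat` here) averages pairs of fine labels; this particular matrix and its column normalization are assumptions made for illustration, since the text above does not fix them.

```python
import numpy as np

def coarsen_labels(D_f, V_f, P_hat):
    """Coarse-label parameters from eq. (5): D^c = D^f P_hat, V^c = P_hat^T V^f P_hat."""
    return D_f @ P_hat, P_hat.T @ V_f @ P_hat

def trace_energy(U, D, W, V):
    """Quadratic form (2): Tr(D U^T + W U V U^T)."""
    return float(np.trace(D @ U.T + W @ U @ V @ U.T))

rng = np.random.default_rng(2)
n, l_f, l_c = 5, 4, 2
D_f = rng.normal(size=(n, l_f))
W = rng.uniform(-1, 1, size=(n, n))
W = (W + W.T) / 2
V_f = rng.uniform(size=(l_f, l_f))
V_f = (V_f + V_f.T) / 2

# Illustrative choice: coarse label 0 averages fine labels {0, 1},
# coarse label 1 averages fine labels {2, 3}.
P_hat = np.array([[0.5, 0.0],
                  [0.5, 0.0],
                  [0.0, 0.5],
                  [0.0, 0.5]])

D_c, V_c = coarsen_labels(D_f, V_f, P_hat)

U_c = np.eye(l_c)[rng.integers(0, l_c, size=n)]   # random coarse labeling
e_coarse = trace_energy(U_c, D_c, W, V_c)
e_fine = trace_energy(U_c @ P_hat.T, D_f, W, V_f) # energy of the interpolated labeling
print(e_coarse, e_fine)
```

Note that $W$ is untouched: only the unary matrix and the label-cost matrix shrink.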

Equations (4) and (5) encapsulate one of our key contributions: constructing an energy pyramid
depends only on $P$. For any interpolation $P$ it is straightforward to derive the coarse-scale energy
in a principled manner. But what is an appropriate interpolation?

3 Energy-aware Interpolation 

The effectiveness of the multiscale approximation of (4) and (5) heavily depends on the
interpolation matrix $P$ ($\hat{P}$, resp.). The matrix $P$ can be interpreted as an operator that
aggregates fine-scale variables into coarse ones (Fig. 1). Aggregating fine variables $i$ and $j$ into
a coarser one excludes from the search space all assignments for which $l_i \neq l_j$. This
aggregation is undesired if assigning $i$ and $j$ to different labels yields low energy. However, when
variables $i$ and $j$ are in agreement under the energy (i.e., assignments with $l_i = l_j$ yield low
energy), aggregating them together allows for efficient exploration of low energy assignments. A
desired interpolation aggregates $i$ and $j$ when $i$ and $j$ are in agreement under the energy.

To estimate these agreements we empirically generate several samples with relatively low energy,
and measure the label agreement between neighboring variables $i$ and $j$ in these samples. We use
Iterated Conditional Modes (ICM) [ ] to obtain locally low energy assignments. This procedure may
be interpreted as Gibbs sampling from the Gibbs distribution
$p(U) \propto \exp\left(-\frac{1}{T} E(U)\right)$ at the limit $T \to 0$ (i.e., the
"zero-temperature" limit). Performing $t = 10$ ICM iterations with $K = 10$ random restarts
provides us with $K$ samples $\{L^k\}_{k=1}^{K}$. The disagreement between neighboring variables
$i$ and $j$ is estimated as $d_{ij} = \frac{1}{K} \sum_k V_{l_i^k, l_j^k}$, where $l_i^k$ is the label
of variable $i$ in the $k$-th sample. Their agreement is then given by
$c_{ij} = \exp\left(-\frac{d_{ij}}{\sigma}\right)$, with $\sigma \propto \max V$.
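
The agreement estimate can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: `icm` is a plain coordinate-wise ICM sweep assuming symmetric $W$ and $V$ and a zero diagonal in $W$, one "iteration" is taken to mean one sweep over all variables, and the proportionality constant in $\sigma \propto \max V$ is set to 1.

```python
import numpy as np

def icm(D, W, V, labels, sweeps=10):
    """Iterated Conditional Modes: greedily relabel one variable at a time."""
    labels = labels.copy()
    for _ in range(sweeps):
        for i in range(D.shape[0]):
            # Cost of each candidate label for i with all other labels fixed.
            # The factor 2 appears because Tr(W U V U^T) counts (i, j) and (j, i).
            cost = D[i] + 2.0 * (W[i] @ V[labels])
            labels[i] = int(np.argmin(cost))
    return labels

def agreements(D, W, V, num_samples=10, sweeps=10, seed=0):
    """Agreement c_ij = exp(-d_ij / sigma) between neighboring variables."""
    rng = np.random.default_rng(seed)
    n, l = D.shape
    samples = [icm(D, W, V, rng.integers(0, l, size=n), sweeps)
               for _ in range(num_samples)]
    # d[i, j] = (1/K) * sum_k V[l_i^k, l_j^k]
    d = np.mean([V[s][:, s] for s in samples], axis=0)
    sigma = V.max()                       # assumed constant of proportionality: 1
    c = np.exp(-d / sigma)
    return np.where(W != 0, c, 0.0)       # agreements are kept for neighbors only

# Toy chain 0 - 1 - 2 - 3: strong attraction inside pairs (0,1) and (2,3),
# weak repulsion (negative weight) between variables 1 and 2.
V = 1.0 - np.eye(2)
D = np.zeros((4, 2))
W = np.zeros((4, 4))
W[0, 1] = W[1, 0] = 5.0
W[2, 3] = W[3, 2] = 5.0
W[1, 2] = W[2, 1] = -1.0
c = agreements(D, W, V)
print(np.round(c, 3))
```

In this toy energy variables 0 and 1 (and likewise 2 and 3) end up with the same label in every sample, so their agreement is 1, while the agreement across the repulsive edge (1, 2) is lower whenever some sample labels them differently.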

Using the variable agreements, $c_{ij}$, we follow the Algebraic Multigrid (AMG) method of [ ] to first
determine the set of coarse scale variables and then construct an interpolation matrix $P$ that softly
aggregates fine scale variables according to their agreement with the coarse ones.

We begin by selecting a set of coarse representative variables $\mathcal{V}^c \subset \mathcal{V}^f$,
such that every variable in $\mathcal{V}^f \setminus \mathcal{V}^c$ is in agreement with
$\mathcal{V}^c$. A variable $i$ is considered in agreement with $\mathcal{V}^c$ if
$\sum_{j \in \mathcal{V}^c} c_{ij} > \beta \sum_{j \in \mathcal{V}^f} c_{ij}$. That is, every variable
in $\mathcal{V}^f$ is either in $\mathcal{V}^c$ or is in agreement with other variables in
$\mathcal{V}^c$, and thus well represented in the coarse scale.

We perform this selection greedily and sequentially, starting with $\mathcal{V}^c = \emptyset$ and
adding $i$ to $\mathcal{V}^c$ if it is not yet in agreement with $\mathcal{V}^c$. The parameter
$\beta$ affects the coarsening rate, i.e., the ratio $n^c / n^f$; a smaller $\beta$ results in a lower
ratio.

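At the end of this process we have a set of coarse representatives $\mathcal{V}^c$. The
interpolation matrix $P$ is then defined by:

$$P_{i I(j)} = \begin{cases} c_{ij} & i \in \mathcal{V}^f \setminus \mathcal{V}^c,\ j \in \mathcal{V}^c \\ 1 & i \in \mathcal{V}^c,\ j = i \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

where $I(j)$ is the coarse index of the variable whose fine index is $j$ (in Fig. 1: $I(2) = 1$ and
$I(3) = 2$).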

We further prune rows of $P$, leaving only the $\delta$ maximal entries. Each row is then normalized
to sum to 1. Throughout our experiments we use $\beta = 0.2$ and $\delta = 3$ for computing $P$.
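
A possible construction of $P$ from the agreements and the coarse set is sketched below. It assumes that the row of a non-representative variable is filled with its agreements to the coarse representatives before pruning and normalization, and that every such variable has positive agreement with at least one representative (which the selection rule guarantees); the function name is illustrative.

```python
import numpy as np

def build_interpolation(c, coarse, delta=3):
    """Soft aggregation matrix P (n_f x n_c) from agreements c and coarse indices."""
    n_c = len(coarse)
    P = c[:, coarse].astype(float)        # agreements to each coarse representative
    P[coarse, :] = 0.0                    # a representative maps only to itself
    P[coarse, np.arange(n_c)] = 1.0
    if n_c > delta:                       # keep the delta largest entries per row
        drop = np.argsort(P, axis=1)[:, :n_c - delta]
        np.put_along_axis(P, drop, 0.0, axis=1)
    return P / P.sum(axis=1, keepdims=True)   # rows sum to 1

# Chain 0 - 1 - 2 - 3; variable 1 agrees more with 0 than with 2.
c = np.zeros((4, 4))
c[0, 1] = c[1, 0] = 1.0
c[1, 2] = c[2, 1] = 0.5
c[2, 3] = c[3, 2] = 1.0
coarse = np.array([0, 2])

P = build_interpolation(c, coarse)
print(P)
```

Here fine variable 1 is split between the two coarse variables with weights 2/3 and 1/3, while fine variable 3 follows coarse variable 2 alone. With `delta=1` the construction degenerates to hard aggregation.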

4 A Unified Discrete Multiscale Framework 

Given an energy $(n, l, D, W, V)$ at scale $s = 0$, our framework first works fine-to-coarse to
compute interpolation matrices $\{P^s\}$ that construct the "energy pyramid":
$\{(n^s, l, D^s, W^s, V)\}_{s=0,\ldots,S}$. Typically we reduce the number of variables by a factor of
2 between consecutive levels, resulting in fewer than 10 variables at the coarsest scale. Since there
are very few degrees of freedom at the coarsest scale, ICM¹ is likely to obtain a low-energy coarse
solution. Then, at each scale $s$ the coarse solution $U^s$ is interpolated to a finer scale $s - 1$:
$U^{s-1} \leftarrow P^s U^s$. At the finer scale, $U^{s-1}$ serves as a good initialization for ICM
(fractional solutions are rounded). These two steps of interpolation followed by refinement are
repeated for all scales from coarse to fine.
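
The overall scheme can be sketched compactly. The code below is an illustration under several assumptions: interpolation matrices are given (here a simple hard pairing instead of the energy-aware interpolation), `Ps[s]` maps scale `s + 1` to scale `s` in 0-based list indexing, fractional rows are rounded by taking the largest entry, and ICM assumes symmetric $W$ and $V$.

```python
import numpy as np

def energy(U, D, W, V):
    """Quadratic form (2): Tr(D U^T + W U V U^T)."""
    return float(np.trace(D @ U.T + W @ U @ V @ U.T))

def icm(D, W, V, labels, sweeps=10):
    """Coordinate-wise ICM; coarse W may carry self-terms on its diagonal."""
    labels = labels.copy()
    for _ in range(sweeps):
        for i in range(D.shape[0]):
            others = W[i].copy()
            others[i] = 0.0                       # self-term is handled separately
            cost = D[i] + W[i, i] * np.diag(V) + 2.0 * (others @ V[labels])
            labels[i] = int(np.argmin(cost))
    return labels

def multiscale(D, W, V, Ps, seed=0):
    rng = np.random.default_rng(seed)
    pyramid = [(D, W)]
    for P in Ps:                                  # fine-to-coarse: build the pyramid
        D_s, W_s = pyramid[-1]
        pyramid.append((P.T @ D_s, P.T @ W_s @ P))
    l = V.shape[0]
    D_s, W_s = pyramid[-1]
    labels = icm(D_s, W_s, V, rng.integers(0, l, size=D_s.shape[0]))
    for s in range(len(Ps) - 1, -1, -1):          # coarse-to-fine
        U_fine = Ps[s] @ np.eye(l)[labels]        # interpolate the coarse solution
        labels = U_fine.argmax(axis=1)            # round fractional rows
        D_s, W_s = pyramid[s]
        labels = icm(D_s, W_s, V, labels)         # refine at the finer scale
    return labels

def pair_up(n):
    """Demo-only hard aggregation: fine variables 2k and 2k+1 map to coarse k."""
    P = np.zeros((n, n // 2))
    P[np.arange(n), np.arange(n) // 2] = 1.0
    return P

rng = np.random.default_rng(3)
n, l = 8, 3
D = rng.normal(size=(n, l))
W = np.zeros((n, n))
for i in range(n - 1):                            # chain graph with signed weights
    W[i, i + 1] = W[i + 1, i] = rng.uniform(-1, 1)
V = 1.0 - np.eye(l)

labels = multiscale(D, W, V, [pair_up(8), pair_up(4)])
print(labels, energy(np.eye(l)[labels], D, W, V))
```

Any other single-scale optimizer could replace `icm` in the two places it is called.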

¹ Our framework is not restricted to ICM and may utilize other single-scale optimization algorithms.

Figure 1: Interpolation as soft variable aggregation: fine variables 1, 2, 3 and 4 are softly
aggregated into coarse variables 1 and 2. For example, fine variable 1 is a convex combination of
.1 of 1 and .3 of 2. Hard aggregation is a special case where $P$ is a binary matrix. In that case
each fine variable is influenced by exactly one coarse variable.

Table 1: Synthetic results: percent of achieved energy value relative to the lower bound computed
by TRW-S (closer to 100% is better) for ICM and TRW-S, for varying strengths of the pair-wise term
($\lambda = 5, 10, 15$; stronger is harder to optimize).



Table 2: Co-clustering results: the baseline for comparison is the state-of-the-art results of [ ].
(a) We report our results as percent of the baseline: smaller is better, and lower than 100%
outperforms the state-of-the-art. (b) We also report the fraction of energies for which our
multiscale framework outperforms the state-of-the-art.
123.6%
(b)    55.6%    0.0%    0.5%
15    127.1%    135.8%    138.3%

Our energy-aware interpolation and ICM play complementary roles in this multiscale framework. 
ICM makes fine scale local refinements of a given labeling, while the energy-aware interpolation 
makes coarse grouping of variables to expose global behavior of the energy. In a sense, ICM is a 
discrete equivalent to the continuous Gauss-Seidel relaxation used in continuous domain multiscale 
schemes. 



5 Experimental Results 

We evaluated our multiscale framework on challenging contrast-enhancing synthetic energies, as well
as on co-clustering energies. We follow the protocol of [ ] that uses the lower bound as a baseline for
comparing performance of different optimization methods on different energies. We report the ratio
between the resulting energy and the lower bound (in percent); closer to 100% is better.

Synthetic: We begin with synthetic contrast-enhancing energies defined over a 4-connected grid
graph of size 50 × 50 ($n = 2500$), and $l = 5$ labels. The unary term $D \sim \mathcal{N}(0, 1)$. The
pair-wise term $V_{\alpha\beta} = V_{\beta\alpha} \sim \mathcal{U}(0, 1)$ ($V_{\alpha\alpha} = 0$) and
$w_{ij} = w_{ji} \sim \lambda \cdot \mathcal{U}(-1, 1)$. The parameter $\lambda$ controls the relative
strength of the pair-wise term; a stronger term (i.e., larger $\lambda$) results in energies that are
more difficult to optimize (see [ ]). The resulting synthetic energies are contrast-enhancing (since
$w_{ij}$ may become negative). Table 1 shows results, averaged over 100 experiments. Using our
multiscale framework to perform coarse-to-fine optimization of the energy yields significantly lower
energies than the single-scale methods used (ICM and TRW-S).
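
For concreteness, a synthetic energy of this kind could be generated as sketched below. The generator name and the dense storage of $W$ are choices made for the example, and the label costs are assumed to be drawn uniformly from $(0, 1)$; the demo call uses a small grid to keep the dense matrix tiny.

```python
import numpy as np

def synthetic_energy(rows=50, cols=50, num_labels=5, lam=5.0, seed=0):
    """Random contrast-enhancing energy on a 4-connected rows x cols grid."""
    rng = np.random.default_rng(seed)
    n = rows * cols
    D = rng.normal(0.0, 1.0, size=(n, num_labels))           # unary term
    V = np.triu(rng.uniform(0.0, 1.0, size=(num_labels, num_labels)), k=1)
    V = V + V.T                                               # symmetric, zero diagonal
    W = np.zeros((n, n))                                      # dense for simplicity
    idx = np.arange(n).reshape(rows, cols)
    right = zip(idx[:, :-1].ravel(), idx[:, 1:].ravel())      # horizontal neighbors
    down = zip(idx[:-1, :].ravel(), idx[1:, :].ravel())       # vertical neighbors
    for i, j in list(right) + list(down):
        W[i, j] = W[j, i] = lam * rng.uniform(-1.0, 1.0)      # signed edge weight
    return D, W, V

D, W, V = synthetic_energy(rows=4, cols=5, num_labels=5, lam=10.0)
print(D.shape, V.shape, np.count_nonzero(np.triu(W)))
```

A sparse matrix would be the natural choice for $W$ at the full 50 × 50 size; the dense array is used here only to keep the sketch short.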

Co-clustering (Correlation-Clustering): The problem of co-clustering addresses the matching of
superpixels within and across frames in a video sequence. Following [ , §6.2], we treat co-clustering
as a minimization of a discrete Potts energy, adaptively adjusting the number of labels. The resulting
energies are contrast-enhancing (with some $w_{ij} < 0$), have no underlying regular grid, no data
term, and are very challenging to optimize. We obtained 77 co-clustering energies, courtesy of [ ],
used in their experiments. Table 2 compares our discrete multiscale framework to the
state-of-the-art results of [ ], obtained by applying a specially tailored convex relaxation method.
Our multiscale framework improves on the state-of-the-art for this family of challenging energies and
significantly outperforms TRW-S.



6 Extensions 

It is rather straightforward to extend our framework to handle energies with a different $V$ for
every pair $(i, j)$. Moreover, higher order potentials can also be considered using the same
algebraic representation. A detailed derivation may be found in [ ].